Is biomedical research self-correcting? Modelling insights on the persistence of spurious science

The reality that large volumes of published biomedical research are not reproducible is an increasingly recognized problem. Spurious results reduce the trustworthiness of reported science, increasing research waste. While science should be self-correcting from a philosophical perspective, that principle in isolation yields no information on the effort required to nullify suspect findings, or on the factors shaping how quickly science may be corrected. There is also a paucity of information on how perverse incentives in the publishing ecosystem, which favour novel positive findings over null results, shape the ability of published science to self-correct. Knowledge of the factors shaping self-correction of science remains obscure, limiting our ability to mitigate harms. This modelling study introduces a simple model to capture the dynamics of the publication ecosystem, exploring factors influencing research waste, trustworthiness, corrective effort and time to correction. Results from this work indicate that research waste and corrective effort are highly dependent on field-specific false positive rates and on the time delay before corrective results to spurious findings are propagated. The model also suggests conditions under which biomedical science is self-correcting, and those under which publication of correctives alone cannot stem the propagation of untrustworthy results. Finally, this work models a variety of potential mitigation strategies, including researcher- and publisher-driven interventions.


Introduction
Biomedical science is integral to human prosperity. Diligent research keeps us healthy, functioning as a compass towards a better future for humanity. Yet despite its critical importance to our collective well-being, there are serious issues with how biological science and medicine are currently practised. This was illustrated starkly by the ongoing COVID-19 pandemic, which saw an explosion of COVID-related research. This uptick in publication not only highlighted how science is practised, but raised concerns over its failings. Within a short period, much COVID research was shown to be of low methodological quality and unsuitable for drawing inferences [1][2][3][4][5][6]. Some of these failings were high-profile enough to alter public opinion and even endanger public health. In 2020, Donald Trump proclaimed hydroxychloroquine a panacea for COVID-19 on the basis of flawed analysis, and later works extolling the anti-parasitic drug ivermectin for COVID-19 attracted worldwide attention, persisting despite subsequent analyses exposing them as unreliable. Consequently, millions worldwide embraced ineffectual medicines with deleterious consequences, spending hundreds of millions of dollars on unsuitable treatments with the potential for harm [7,8].
Irreproducible research was already a problem long before the dawn of COVID-19. Swathes of ostensibly important biomedical results simply do not withstand deeper scrutiny [9]. In fields as diverse as psychology, genomics and cancer research, there has been increasing recognition that we are experiencing a wide-scale replication crisis, in which seemingly important results do not hold up under inspection. This is an especially pronounced problem in medicine, where life-or-death clinical decisions might pivot on research outcomes. There are many culprits for this state of affairs. Inappropriate statistical practices and manipulation underpin many spurious results [10][11][12][13][14], affecting up to 75% of biomedical publications [15][16][17]. In fields as vital as cancer research, only 11%-50% of preclinical results are deemed replicable [18,19]. This is compounded by publication bias, where high-prestige journals fixate upon novel 'positive' results over more diligent null results. This has long been a problem, but one that has only increased with time: in 1991/1992, positive results constituted 70.1% of scientific publications; by 2007, this had grown to 85.9%, and it continues to grow [20]. In some fields, positive findings constitute over 95% of published results [21]. The metric-driven 'publish-or-perish' environment in which scientists and physicians inevitably operate is also a factor, as it inadvertently privileges dubious findings [22], a phenomenon dubbed the natural selection of bad science [23]. Modelling studies also suggest that scientists acting rationally to maximize the career value of their publications produce a high rate of false positives [24]. Data and methods are often jealously guarded, making it difficult to spot methodological errors or even to verify assertions made in the published literature. In many instances, even seemingly robust results prove to be fragile [25,26].
Publication bias also means that for many scientists there is little reward in pursuing corrective endeavours, and only minor benefit in disseminating null results. Consequently, replication efforts are rarely undertaken, and null results are often buried rather than submitted for publication. This 'file-drawer' problem means that important and diligent negative findings often go unpublished, while spurious findings are unduly favoured [27][28][29][30]. On the face of it, this is a strange state of affairs: it is every bit as critical to know that a particular treatment has no effect as to know that it has one, yet only the latter is highly valued. While modelling suggests that this situation inherently privileges careless or unethical researchers over more diligent ones, if science is ultimately self-correcting, all such errors will eventually be remedied. Self-correction lies at the heart of the philosophy of science and is considered a vital hallmark of scientific practice. Science by definition pivots on empirical observations and holds its results to perpetual scrutiny. With the passage of time, a spurious result will fail replication, and the scientific record will be corrected. This is perhaps true on a long enough timescale, but it does not take into account the substantial obstacles impeding correction. Science is an enterprise conducted by humans, with all of their incentives and failings. If positive results are disproportionately rewarded while replication and null findings are systematically impeded, then vital corrective efforts will be stunted. In recent years, many authors have questioned whether our current paradigm truly enables self-correction. This has been examined particularly in the context of psychology, one of the first major fields to identify a replication crisis and examine its impacts objectively [31,32]. In other fields, journals show marked reluctance to correct flawed works [33]. One recent review noted that biomedicine as a field was severely lacking in appropriate levels of transparency and, crucially, critical appraisal. The authors concluded that fields without transparency and robust, verifiable mechanisms for critical appraisal 'cannot reasonably be said to be self-correcting, and thus do not warrant the credibility often imputed to science as a whole' [34].
Accordingly, science as it currently operates is not always self-correcting, causing harm to both science and society [35]. Scientific practice is also frequently to blame. Examining the history of medical research, Doug Altman and Iveta Simera detailed how poor experimental design and subpar reporting have been longstanding problems, citing Torald Sollmann, who as early as 1917 noted that the 'quality of published papers is a fair reflection of the deficiencies of what is still the common type of clinical evidence. A little thought suffices to show that the greater part cannot be taken as serious evidence at all' [36]. Despite high-profile expert guidelines on research conduct, many published papers still demonstrate substandard reporting and methodological issues [37,38]. Spurious results can arise as artefacts of false positives, from poor experimental design, or even from deliberate fraud. Untrustworthy findings that do not withstand closer inspection are ultimately research waste, contributing noise without benefiting patients or the research community [39]. By some estimates, research waste constitutes up to 85% of biomedical science output [40]. Even when retracted, these now-phantom publications can remain highly cited long after their credibility has decayed [41][42][43][44][45][46][47], and citations to retracted publications are rarely critical [48,49].
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 11: 231056
While the problem of self-correction has begun to garner significant attention, there is as yet no means to quantify the likely corrective effort required to sustain self-correction, nor the volume of research waste likely to be produced in a given biomedical field. To the author's knowledge, no existing literature estimates the likely time taken to correction, nor the situations in which self-correction may break down. This is not surprising, as the publishing ecosystem in which science operates is complex and multifaceted, and direct measurement of the relevant factors is difficult or impossible. Without this knowledge, however, we cannot realistically estimate the extent of the problem, nor optimally gauge potential strategies to mitigate it. Modelling can provide insight into such problems, and previous studies have examined how pressures on scientists can lead to spurious findings [22][23][24], but there have been few direct investigations into how the scientific publishing ecosystem itself might modulate this problem, and what factors might ameliorate or exacerbate research waste. In this work, we take a novel mathematical modelling approach, crafting a simple dynamic model of scientific publishing and self-correction. The resulting model allows us to estimate the likely impacts of different factors on scientific trustworthiness, research waste, correction time and corrective effort, subject to reasonable field-specific assumptions. This not only allows us to make quantified predictions about the dynamics of scientific research, but also yields insight into possible strategies to mitigate spurious research, the ramifications of which are developed and discussed.

Model outline
To investigate factors shaping the research ecosystem, we consider a simple 'gold rush' model, in which spurious results, $x(t)$, beget consequent spurious results as other groups or the initial investigators pursue similar avenues. This increases the volume of untrustworthy literature, but eventually spurs other investigators to examine these claims and publish contrary literature from more diligent experimentation, $y(t)$. The growth rate of consequent spurious claims is $g$. The rate at which corrective literature emerges, based on the prevalence of suspect literature, is $e$, and corrective attempts have a relative impact of $d$. These constants subsume different quantities, and will be defined shortly in context. The relationship between suspect and corrective literature is thus

$$\frac{\partial x}{\partial t} = g\,x(t) - d\,y(t) \tag{2.1}$$

and

$$\frac{\partial y}{\partial t} = e\,x(t). \tag{2.2}$$

This coupled system of first-order differential equations can be solved explicitly, subject to the initial condition $x(0) = x_0$. Corrective efforts emerge after a time $t_d$ from the initial erroneous publication, so that $y(t \le t_d) = 0$, to account for the reality that there can be latent periods before spurious claims are challenged. Writing $h = \sqrt{4de - g^2}$, the system yields closed-form analytical solutions for $x(t)$ and $y(t)$ of

$$x(t) = \begin{cases} x_0 \exp(g t), & 0 \le t \le t_d,\\ x_0 \exp\!\left(\frac{g(t + t_d)}{2}\right)\left[\cos\!\left(\frac{h(t - t_d)}{2}\right) + \frac{g}{h}\sin\!\left(\frac{h(t - t_d)}{2}\right)\right], & t > t_d, \end{cases} \tag{2.3}$$

and

$$y(t) = \begin{cases} 0, & 0 \le t \le t_d,\\ \frac{2 e x_0}{h} \exp\!\left(\frac{g(t + t_d)}{2}\right)\sin\!\left(\frac{h(t - t_d)}{2}\right), & t > t_d. \end{cases} \tag{2.4}$$
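As a sanity check, the closed-form solution of equations (2.1) and (2.2), with $y = 0$ up to $t_d$, can be evaluated directly and compared against the differential equations by finite differences. The Python sketch below is illustrative only: the function name `spurious_and_corrective` and the example rate values are hypothetical, it is not the author's published code (archived in the repository listed under Data accessibility), and it assumes $4de > g^2$ so that $h$ is real.

```python
import math


def spurious_and_corrective(t, g, d, e, x0=1.0, t_d=0.0):
    """Evaluate x(t) (spurious) and y(t) (corrective) in closed form.

    Solves dx/dt = g*x - d*y and dy/dt = e*x (for t > t_d), with x(0) = x0
    and y(t) = 0 for t <= t_d. Assumes 4*d*e > g**2 so that h is real.
    """
    if 4 * d * e <= g ** 2:
        raise ValueError("this closed form assumes 4de > g^2")
    h = math.sqrt(4 * d * e - g ** 2)
    if t <= t_d:
        # Before corrections appear, spurious work grows exponentially.
        return x0 * math.exp(g * t), 0.0
    tau = t - t_d                           # time since corrections began
    amp = x0 * math.exp(g * (t + t_d) / 2)  # equals x0*exp(g*t_d)*exp(g*tau/2)
    x = amp * (math.cos(h * tau / 2) + (g / h) * math.sin(h * tau / 2))
    y = amp * (2 * e / h) * math.sin(h * tau / 2)
    return x, y


if __name__ == "__main__":
    # Hypothetical illustrative rates, chosen only so that 4de > g^2.
    g, d, e = 0.15, 1.8, 0.07
    for t in (0.5, 1.0, 2.0, 4.0):
        x, y = spurious_and_corrective(t, g, d, e, t_d=1.0)
        print(f"t={t:>4}: x={x:.3f}, y={y:.3f}")
```

The two branches join continuously at $t_d$: $y$ starts at zero and $x$ starts at $x_0\exp(g t_d)$, which is why the delay matters so much for the later dynamics.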
We further implement realistic parameters for $g$, $d$ and $e$. For a field with an average research rate of $p_r$ per unit time, the rate of spurious positive results depends upon the field's false positive rate $f_p$ and the publication bias $B$ of a family of relevant journals, defined as the fraction of significant findings published (relative to null findings) in field-specific journals. Assuming all positive results are submitted for publication, $g = p_r B f_p$. Conversely, true null findings occur at a rate proportional to $1 - f_p$, with publication bias $1 - B$ for null findings. We account for the file-drawer problem, in which only a fraction of null results are submitted for publication, by introducing the submitted fraction, $s_r$. It follows that $e = p_r (1 - B)(1 - f_p) s_r$. Finally, we consider how impactful null results are relative to positive findings, given by $d = k p_r$, where $k$ is a constant. When $k = 1$, corrective publications have an impact on the publication ecosystem equal to that of spurious findings. As even retracted publications continue to be cited, this suggests that in general $k < 1$. When $g^2 < 4de$, corrective literature will eventually nullify spurious findings, when $x(t_c) = 0$, yielding

$$t_c = t_d + \frac{2\pi + 2\arctan(-h/g)}{h}, \tag{2.5}$$

where $h = \sqrt{4de - g^2}$. When $g^2 \ge 4de$, there is no positive real solution for $t_c$. This corresponds to the scenario where corrective literature cannot remedy the propagation of a spurious finding: a situation in which science is not self-correcting by conventional publication schema. While there are various definitions of what constitutes research waste, for the purposes of this work we define research waste as the total spurious publications produced in the interval $[0, t]$, or $r_w = \int_0^t x(\tau)\,\mathrm{d}\tau$, and the corrective effort to nullify spurious findings as $c_e = \int_0^t y(\tau)\,\mathrm{d}\tau$. For $t > t_d$, integrating equations (2.1) and (2.2) yields

$$r_w(t) = \frac{x_0}{g}\left[\exp(g t_d) - 1\right] + \frac{y(t)}{e}$$

and

$$c_e(t) = \frac{1}{d}\left[\frac{g\,y(t)}{e} - x(t) + x_0 \exp(g t_d)\right].$$

Evaluating these identities at $t = t_c$ (when defined) yields the total spurious publications and the corrective effort required to correct a spurious finding. It is worth noting that the model explicitly assumes that corrective efforts can nullify spurious findings, but this would not hold if volumes of evidence were insufficient to remove falsified ideas from the scientific canon, a point raised further in the Discussion. For brevity, we designate $r_w(t_c)$ as $r_w$ and $c_e(t_c)$ as $c_e$ unless otherwise specified. Summary descriptions of the model terms are given in table 1.
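The parameter mapping above, together with the closed-form solution of equations (2.1) and (2.2), can be combined into a short calculator for $t_c$, $r_w$ and $c_e$. This is a hedged Python sketch rather than the author's implementation: the `Field` class and `correction_outcome` function are hypothetical names, it assumes $f_p > 0$ (so $g > 0$), and it obtains $r_w$ and $c_e$ from two identities that follow from the model, namely $\int_{t_d}^{t} x\,\mathrm{d}\tau = y(t)/e$ (from equation (2.2)) and the integral of equation (2.1), both evaluated at $t_c$ where $x(t_c) = 0$.

```python
import math
from dataclasses import dataclass


@dataclass
class Field:
    p_r: float  # average research rate (publications per year)
    B: float    # publication bias: fraction of published findings that are positive
    f_p: float  # field-specific false positive rate (assumed > 0)
    s_r: float  # fraction of null results submitted for publication
    k: float    # relative corrective impact of null results

    def rates(self):
        g = self.p_r * self.B * self.f_p
        e = self.p_r * (1 - self.B) * (1 - self.f_p) * self.s_r
        d = self.k * self.p_r
        return g, d, e


def correction_outcome(field, t_d=0.0, x0=1.0):
    """Return (t_c, r_w, c_e), or None if g^2 >= 4de (no self-correction)."""
    g, d, e = field.rates()
    if g <= 0:
        raise ValueError("this sketch assumes f_p > 0, so that g > 0")
    if g ** 2 >= 4 * d * e:
        return None
    h = math.sqrt(4 * d * e - g ** 2)
    tau_c = (2 * math.pi + 2 * math.atan(-h / g)) / h  # time after t_d until x = 0
    t_c = t_d + tau_c
    # Corrective literature y at t_c (closed-form solution with y = 0 up to t_d).
    y_c = x0 * math.exp(g * (t_c + t_d) / 2) * (2 * e / h) * math.sin(h * tau_c / 2)
    # Research waste: exponential phase up to t_d, then integral of x equals y(t_c)/e.
    r_w = x0 * (math.exp(g * t_d) - 1) / g + y_c / e
    # Corrective effort: integrate dx/dt = g x - d y from t_d to t_c, using x(t_c) = 0.
    c_e = (g * y_c / e + x0 * math.exp(g * t_d)) / d
    return t_c, r_w, c_e


if __name__ == "__main__":
    # Parameters quoted for Figure 1: p_r = 3, B = 0.95, s_r = 0.5, k = 0.6.
    for f_p in (0.05, 0.20, 0.25):
        for t_d in (0.0, 2.0):
            out = correction_outcome(Field(3, 0.95, f_p, 0.5, 0.6), t_d=t_d)
            print(f_p, t_d, out if out else "no correction by publication alone")
```

Returning `None` rather than raising keeps the uncorrectable regime ($g^2 \ge 4de$) as an ordinary outcome, which is convenient when sweeping over $f_p$ or $t_d$ to build contour plots.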

Simulated scenarios
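With the model established, we simulate the following scenarios:
1. Impacts of field-specific false positive rates and corrective time delay on result trustworthiness, correction time and research waste (publishing ecosystem). To investigate this, we simulate the dynamics of fields with varying false positive rates ($f_p$) and differing times for delaying corrective action, $t_d$.
2. Impacts of file-drawer problems on self-correction and corrective impact on research waste (researcher-driven interventions). To examine the impacts researchers can have on the levels of research waste in a field, we simulate the file-drawer problem by varying $s_r$, the proportion of null results submitted. Similarly, we investigate the impact of the value researchers place on null results relative to positive results by varying $k$, the relative corrective effect of null results. It is worth noting that these actions may not be entirely researcher-driven, a nuance expanded upon in the Discussion section.
3. Hypothetical impacts of eliminating publication bias on research waste and result trustworthiness (publishing industry interventions). Publishers themselves have substantial scope to influence the trustworthiness of research in a field through their editorial policy. In this simulation, we examine the impact of removing publication bias. In this case, there is no incentive to withhold null results or to value them less than positive ones. Accordingly, in this hypothetical scenario, $s_r = k = 1$, $g = p_r f_p$ and $e = p_r (1 - f_p)$. This can be simulated for varying values of the false positive rate $f_p$ and time to corrective action $t_d$ to gauge the scope of such policy.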

Impacts of field-specific false positive rates and time delay on result trustworthiness
Figure 1 depicts the impact on the publication ecosystem of an initial spurious result ($x_0 = 1$) in fields with low ($f_p = 0.05$) and high ($f_p = 0.20$) false positive rates, with simulation parameters of $p_r = 3$ submissions a year and $s_r = 0.5$ (half of all null results submitted). As the proportion of positive results published has approached unity in many fields in recent years [21], we initially assume for illustrative purposes that 95% of published findings are positive, or $B = 0.95$, and that $k = 0.6$. Results are given in table 2.
As expected from equation (2.5), $t_c$ increases with false positive rate, and research waste is highly dependent on the time delay until corrective action, $t_d$. Figure 2 shows contour plots for both the correction time $t_c$ and the volume of research waste produced, $r_w$. In all cases, $r_w > c_e$, so it always takes a lesser volume of corrective work to nullify spurious results, even when $k < 1$. When $g^2 \ge 4de$, there is no real value for $t_c$, corresponding to a situation where spurious work cannot be remedied by corrective publication. In this example, this occurs when $f_p = 0.25$, where spurious publications are irreversible by correction, as illustrated in figure 3.
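Because the research rate $p_r$ cancels from the condition $g^2 < 4de$, the false positive rate at which correction by publication alone fails can be solved for directly. The sketch below (a hypothetical helper, using the same parameter mapping as the model outline) rearranges $B^2 f_p^2 < 4k(1-B)(1-f_p)s_r$ into a quadratic in $f_p$; with $B = 0.95$, $k = 0.6$ and $s_r = 0.5$ it places the threshold between the $f_p = 0.20$ and $f_p = 0.25$ cases discussed above.

```python
import math


def critical_false_positive_rate(B, k, s_r):
    """Largest f_p for which g^2 < 4de, i.e. publication can still self-correct.

    With g = p_r*B*f_p, e = p_r*(1-B)*(1-f_p)*s_r and d = k*p_r, the research
    rate p_r cancels and g^2 < 4de becomes B^2 f_p^2 + c f_p - c < 0,
    where c = 4*k*(1-B)*s_r. The threshold is the positive root.
    """
    c = 4 * k * (1 - B) * s_r
    return (-c + math.sqrt(c ** 2 + 4 * B ** 2 * c)) / (2 * B ** 2)


if __name__ == "__main__":
    f_crit = critical_false_positive_rate(B=0.95, k=0.6, s_r=0.5)
    print(f"self-correction by publication alone requires f_p < {f_crit:.3f}")
```

Under these assumptions the threshold is roughly $f_p \approx 0.23$, and it rises as $s_r$ or $k$ increase, which anticipates the researcher-driven interventions examined in Figure 4.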

Hypothetical impacts of eliminating publication bias on research waste and result trustworthiness
With the hypothetical removal of publication bias, figure 5 depicts the likely impacts at varying false positive rates, with selected values given in table 3. It is immediately apparent that, relative to the current paradigm illustrated previously, correction time, research waste and corrective effort are all vastly reduced. It can readily be shown that, were publication bias eliminated, the condition for correction, $4de - g^2 > 0$, yields $f_p < 2(\sqrt{2} - 1) \approx 0.828$. Thus, in a situation without publication bias, self-correction through publication alone could be possible even with a false positive rate of just under 83%. This is in stark contrast to the current paradigm.
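The bound quoted above follows in a few lines. With $s_r = k = 1$, the parameter mapping gives $g = p_r f_p$, $e = p_r(1 - f_p)$ and $d = p_r$, so that, taking the positive root since $0 \le f_p \le 1$:

```latex
\begin{aligned}
4de - g^2 &= 4p_r^2(1 - f_p) - p_r^2 f_p^2 = p_r^2\left(4 - 4f_p - f_p^2\right) > 0\\
&\iff f_p^2 + 4f_p - 4 < 0\\
&\iff f_p < -2 + 2\sqrt{2} = 2\left(\sqrt{2} - 1\right) \approx 0.828.
\end{aligned}
```

Neither $p_r$ nor $t_d$ enters this condition: within the model, the corrective delay changes how long correction takes and how much waste accrues, but not whether correction is possible.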

Discussion
The model outlined in this work is a simplification of an inherently complex system, but gives us some predictive insight provided certain reasonable assumptions hold.With current publication incentives, the modelling in this work strongly suggests there are situations where self-correction will be rendered impossible through publication alone.In fields where false positive rates are high, the likelihood of this outcome is increased, and exacerbated by several factors.The time taken for corrective work to begin is a perhaps surprising factor suggested by this investigation.While this might only marginally increase time taken to correction, it has outsized influence on the volume of research waste produced, and the sheer corrective effort required.This model suggests accordingly that the longer spurious results are ignored, the more subsequent research waste they produce.
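Within the model, this asymmetry can be made explicit. Because $y = 0$ up to $t_d$, the delay only shifts the start of the coupled phase and rescales its starting value to $x(t_d) = x_0\exp(g t_d)$, while equation (2.2) gives $\int_{t_d}^{t} x\,\mathrm{d}\tau = y(t)/e$. Writing $t_c^{0}$, $r_w^{0}$ and $c_e^{0}$ for the values with no delay, a short derivation from equations (2.1) and (2.2) under these assumptions gives:

```latex
\begin{aligned}
t_c(t_d) &= t_c^{0} + t_d,\\
c_e(t_d) &= \exp(g t_d)\, c_e^{0},\\
r_w(t_d) &= \exp(g t_d)\, r_w^{0} + \frac{x_0}{g}\left[\exp(g t_d) - 1\right].
\end{aligned}
```

A delay therefore adds linearly to the correction time but multiplies both corrective effort and research waste by at least $\exp(g t_d)$, a factor that grows fastest in fields with high false positive rates (large $g$). For example, with the Figure 1 parameters and $f_p = 0.20$ ($g = 0.57$ per year), a 2-year delay multiplies the corrective effort by $\exp(1.14) \approx 3.1$.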
The practice of research itself can mitigate this somewhat. Better research conduct and proper statistical analysis yield fewer spurious findings in the first instance. This aside, model results suggest that the file-drawer problem is a substantial one, and that researchers burying their null results harm the scientific enterprise. The same applies to the concealment of contradictory results or disconfirmatory findings. It is critical for healthy science that scientists submit null results, regardless of the perceived status of doing so. Equally, null results should be weighed on a par with positive findings. While an exciting result might suggest a research area to pursue, researchers must be mindful to also search for counter-evidence. This is currently not the case, as evidenced by the sheer number of citations that retracted publications continue to accrue. But much of the blame for this should be shared with scientific publishing. The fixation on novel, positive results over diligent science creates a perverse incentive to favour even illusory positives over reliable nulls. While the file-drawer problem might be seen as a researcher-driven phenomenon, the reality is that it occurs in response to the publication pressure under which scientists must operate. Fixation on novelty alone is antithetical to healthy science, and journals, as the gatekeepers of scientific publication, need to acknowledge this and take corrective action if they are serious about upholding scientific trustworthiness. The removal of publication bias has the starkest effect in the model, radically reducing research waste and increasing the trustworthiness of findings. This is within the power of journals to implement. Science is from a philosophical standpoint self-correcting, and journals should revise policies that inadvertently impede this. Academic culture itself shares in the blame, as misguided efforts to gauge a researcher's worth by publication metrics lend themselves not to better science but to questionable research practices beyond the scope of this paper [50][51][52].
It is important to note that the model outlined here has significant limitations. It is an approximation of complex dynamics, and does not account for factors that might be introduced outside of the simulated ecosystem. In many respects, the model is overly optimistic about our ability to self-correct. Scientists are human, subject to human biases. Confirmation bias is inevitable, and the volume of evidence may not be enough to dislodge falsified ideas. There is considerable evidence that, even when published, negative findings tend not to attract high levels of citation [53]. This is partially accounted for by the parameter $k$ in the model, which captures the relative corrective effect of null findings, but even this might be overly optimistic. This can lead to a scenario where zombie findings persist despite refutation. This might mean that while technical self-correction is achieved, these zombie studies linger on, still creating future research waste. Overcoming this may require a fundamental retraining of researchers [54].
Another potential limitation of the current model, related to the last point, is that it does not yet consider the potential for 'walled gardens' inside a subfield. For example, a concept may be refuted by high-quality studies, but the falsified idea may persist in a subdomain through citations to the debunked original. This has been seen empirically in certain fields in the trajectories of thoroughly refuted papers [55], and if common would further impede corrective action. How widespread this is remains unclear, but in the parlance of this model its net effect would be to reduce $k$ even further, potentially to the extent that publication-driven self-correction of science is rendered impossible. A recent analysis by Sigurdson et al. [56] examines the specific example of homeopathy, quantifying how the majority of homeopathy trials report positive outcomes for the treatment, despite its central tenets being physically impossible [57]. Over half of these studies contained demonstrable statistical errors, suggesting that in fields such as this, walled gardens might mean that no amount of subsequent investigation can dislodge intrinsically pathological science.
There is also a further issue in corrective science: it is incredibly difficult to motivate researchers to undertake such efforts. It has recently been suggested that researchers should get credit for their efforts to correct the literature [58], and this seems overdue. From personal experience, however, while it is possible to get corrections published in the literature [59,60], it is common to find journals reluctant to engage with good-faith criticism, or even outright hostile to it. Correcting spurious work is a serious effort, and one that goes unrecognized and unrewarded in our current schema. This needs to change if we are serious about scientific quality and research integrity. The model outlined here provides useful insight into how self-correction may or may not be achievable, and the factors shaping its efficacy. It also yields insights into how we might mitigate research waste and shape sustainable research in biomedicine. This remains, however, an understudied problem, and one that demands much more research effort if we are to maintain trust in science, for both the scientific community and the wider public.
Ethics. This work did not require ethical approval from a human subject or animal welfare committee.
Data accessibility. Data and relevant code for this research work are stored in GitHub: https://github.com/drg85/SpuriousScience and have been archived within the Zenodo repository at https://zenodo.org/doi/10.5281/zenodo.10047878 [61].
Declaration of AI use. We have not used AI-assisted technologies in creating this article.
Authors' contributions. D.R.G.: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, resources, software, validation, visualization, writing-original draft, writing-review and editing.

Figure 4
Figure 4 depicts the impact of researcher-driven interventions on the time taken for correction of spurious results and the degree of research waste generated. The higher the submitted proportion of null results ($s_r \to 1$), the shorter the time to correction and the less research waste generated. Similar patterns are observed when null and negative results are valued equally to positive or significant results by the research community ($k \to 1$). Low values of $k$ and $s_r$ lead to greater volumes of research waste, more corrective effort and longer times to correction. For sufficiently low values of $s_r$ and $k$, $g^2 \ge 4de$, and consequently no correction via publication alone is possible.
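The uncorrectable region in Figure 4 can be located analytically. Since $p_r$ cancels, $g^2 < 4de$ is equivalent to $k\,s_r > B^2 f_p^2 / [4(1-B)(1-f_p)]$, so only the product of the two researcher-controlled quantities matters. The Python sketch below uses hypothetical function names and assumes $B = 0.95$ as in Figure 1, since the Figure 4 caption states only that the other parameters are held constant.

```python
def min_corrective_product(B, f_p):
    """Smallest value of k * s_r for which g^2 < 4de (self-correction possible).

    From g = p_r*B*f_p, e = p_r*(1-B)*(1-f_p)*s_r and d = k*p_r, the rate p_r
    cancels and g^2 < 4de reduces to k*s_r > B^2 f_p^2 / (4 (1-B)(1-f_p)).
    """
    return B ** 2 * f_p ** 2 / (4 * (1 - B) * (1 - f_p))


def can_self_correct(B, f_p, s_r, k):
    """True if publication alone can eventually nullify a spurious finding."""
    return k * s_r > min_corrective_product(B, f_p)


if __name__ == "__main__":
    # Assumed Figure 4 setting: f_p = 0.10, with B = 0.95 as in Figure 1.
    threshold = min_corrective_product(B=0.95, f_p=0.10)
    print(f"correction requires k * s_r > {threshold:.4f}")
    for s_r, k in [(0.05, 0.5), (0.1, 0.4), (0.5, 0.6)]:
        print(s_r, k, can_self_correct(0.95, 0.10, s_r, k))
```

Under these assumptions the boundary is a hyperbola $k\,s_r \approx 0.05$ in the $(s_r, k)$ plane, so a field can compensate for a strong file-drawer effect only if null results carry correspondingly more weight, and vice versa.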

Figure 1. Figure 2. Figure 3.
Figure 1. Dynamics of the publishing ecosystem for (a) low false positive rate, no corrective delay; (b) low false positive rate, minor corrective delay; (c) high false positive rate, no corrective delay; and (d) high false positive rate, 2-year corrective delay. Note the varying y-axis scales (denoting the number of spurious publications released at a specific time) in all subfigures. The parameter $t_d$ is the corrective delay, the time lag before corrective efforts to counter spurious publications begin.

Figure 4.
Figure 4. For a spurious result in a field with $f_p = 0.10$ and $t_d = 1$ year: (a) time to correction for a spurious result with varying values of $s_r$ and $k$, with all other parameters kept constant; (b) the $\log_{10}$ of research waste ($r_w$) with varying values of $s_r$ and $k$, with all other parameters kept constant. Note the log scale on this figure, so that the contour line of 2 is equivalent to $r_w = 100$, 3 corresponds to $r_w = 1000$, etc. White areas depict regions where no corrective effect is possible by publication alone.

Figure 5.
Figure 5. Dynamics of publishing ecosystem in the absence of publication bias for (a) low false positive rate, no corrective delay, (b) low false positive rate, minor corrective delay, (c) very high false positive rate, no corrective delay and (d) very high false positive rate, 2-year corrective delay.Note the varying y-axis scales in all subfigures.

Table 1.
Model terms and definitions.
dynamics of fields with varying false positive rates ($f_p$) and differing times for delaying corrective action, $t_d$. 2. Impacts of file-drawer problems on self-correction and corrective impact on research waste (researcher-driven interventions). To examine impacts researchers can have on the levels of research waste in a field, we can simulate the file-drawer problem by investigating effects of varying $s_r$, the proportion of null results submitted. Similarly, we can also investigate the impacts of the value researchers place on null results relative to positive results by varying $k$, the relative corrective effect of null results. It is worth noting that these actions may not be entirely researcher driven, a nuance expanded upon in the Discussion section. 3. Hypothetical impacts of eliminating publication bias on research waste and result trustworthiness (publishing industry interventions).

Table 2.
Corrective time, research waste and corrective effort for scenarios in figure 1.

Table 3.
Corrective time, research waste and corrective effort for scenarios in figure 5.